Processing Dialectal Arabic: Exploiting Variability and Similarity to Overcome Challenges and Discover Opportunities

Author

  • Mona Diab

Abstract

We have recently witnessed exponential growth in the use of dialectal Arabic in both textual data and speech recordings, especially on social media. Processing such media is of great utility for a wide range of applications, from information extraction, to social media analytics for political and commercial purposes, to building decision support systems. Compared to other languages, Arabic, especially its informal variety, poses a significant challenge to natural language processing algorithms: it comprises multiple dialects, exhibits linguistic code-switching, and lacks standardized orthographies, all on top of its relatively complex morphology. At its core, the problem of processing Arabic in the context of social media is the problem of how to handle resource-poor languages. In this talk, I will go over some of our insights into these problems and show that there is a silver lining: some of our solutions generalize to other low-resource language contexts.

Similar resources

Elaborating on the opinion of medical and nursing students of the Kurdistan University of Medical Sciences: Challenges and opportunities of virtual learning in focus

Introduction: Universities have also used online courses as a tool to establish lifelong learning among students. Lifelong learning has become part of the way of life due to the dynamic nature of modern society. The community's demand for lifelong learning will be supported by the growth of online learning courses. Universities can reduce the cost of education providers by developing distance l...

DALILA: The Dialectal Arabic Linguistic Learning Assistant

Dialectal Arabic (DA) poses serious challenges for Natural Language Processing (NLP). The number and sophistication of tools and datasets in DA are very limited in comparison to Modern Standard Arabic (MSA) and other languages. MSA tools do not effectively model DA, which makes the direct use of MSA NLP tools for handling dialects impractical. This is particularly a challenge for the creation of...

SEeSAW - Similarity Exploiting Storage for Accelerating Analytics Workflows

The key to successful deployment of big data solutions lies in the timely distillation of meaningful information. This is made difficult by the mismatch between volume and velocity of data at scale and challenges posed by disparate speeds of IO, CPU, memory and communication links of data storage and processing systems. Instead of viewing storage as a bottleneck in this pipeline, we believe tha...

CODACT: Towards Identifying Orthographic Variants in Dialectal Arabic

Dialectal Arabic (DA) is the spoken vernacular for over 300M people worldwide. DA is emerging as the form of Arabic written in online communication: chats, emails, blogs, etc. However, most existing NLP tools for Arabic are designed for processing Modern Standard Arabic, a variety that is more formal and scripted. Apart from the genre variation that is a hindrance for any language processing, e...

Arabic Dialect Processing Tutorial

The existence of dialects for any language constitutes a challenge for NLP in general, since it adds another set of variation dimensions from a known standard. The problem is particularly interesting and challenging in Arabic and its different dialects, where the divergence from the standard could, in some linguistic views, warrant a classification as different languages. This problem would not b...

Publication date: 2016